ABSTRACT

The L statistic of E. B. Page (1963) tests the 
agreement of a single group of judges with an a priori ordering of 
alternative treatments. This paper extends the two group test of D. 
W. Leitner and C. M. Dayton (1976), an extension of the L test, to 
analyze differences in consensus between two unequally sized groups of
judges. Exact critical values are tabulated for small numbers of 
treatments and judges, and a unit normal approximation is developed 
for larger samples. A brief computation example is also provided. 
Analysis of the two-tailed unit normal approximation shows that 
disparity between sample sizes often leads to underestimating the 
probability of a Type I error. However, as the number of treatments 
increases, the fit becomes better. This test, in comparison to two 
parametric competitors in a 2 by 4 mixed design, made fewer Type I 
errors when data were sampled from the normal, uniform, and 
exponential distributions, but it was also shown to be generally more
conservative. The unit normal approximation is not severely affected
by unequally sized groups having heterogeneous variances. The
proposed test shows adequate power in conditions that do not favor
parametric tests (i.e., data that are not interval level or not
normally distributed). Four tables present analysis results, and four figures
illustrate the discussion. (SLD) 
Abstract



Page's (1963) L statistic tests the agreement of a single group of
judges with an a priori ordering of alternative treatments. This
paper extends the two group test developed by Leitner & Dayton
(1976) to analyze the difference in consensus between two unequally
sized groups of judges. Exact critical values are tabled for small
numbers of treatments and judges; a unit normal approximation
(z_LB) is developed for larger samples. A brief computation example
couched in terms of an educational study is also provided. Analysis
of the two-tailed unit normal approximation shows that disparity
between sample sizes often leads to underestimating the probability
of committing a Type I error; however, as the number of treatments
increases, the fit becomes better. In comparison with two
parametric competitors in a 2 x 4 mixed design, this test made fewer Type I
errors when data were sampled from the normal, uniform, and
exponential distributions, but it was also shown to be generally more
conservative. Also in contrast to the parametric tests, z_LB is not
severely affected by unequally sized groups having
heterogeneous variances. Furthermore, the proposed test shows
adequate power in conditions that do not favor parametric tests (i.e.,
data that are not interval level or not normally distributed).
Page's L test provides a nonparametric statistical index for the
agreement of a single group of n rankings with a set of k alternatives
(i.e., treatments, objects), with the following null hypothesis:

$$H_0\colon T_1 = T_2 = \cdots = T_k \tag{1}$$
where T_i represents the mean for the ith treatment. The most
general alternative hypothesis is that at least one pair among the k
alternatives (T_i) is not equal. However, in many experiments an a
priori ordered outcome of the k treatments can be expected from
theoretical considerations and is thus of scientific interest. In such a
case, the L test can be used, and the alternative hypothesis can be
phrased as follows:

$$H_a\colon T_1 \le T_2 \le \cdots \le T_k \tag{2}$$
with at least one strict inequality. Thus, the L test was specifically
designed for use with data measured at or reduced to an ordinal
level. This test of ordered hypotheses for multiple treatments is
based on the linear contribution each ranking makes to the
treatment sum of squares (Page, 1963). It has also been shown to be
equivalent to the average Spearman rank-order correlation between
the a priori ordering and the group of n rankings (Lyerly, 1952; Page,
1963). That is, the average of the n correlations between individual
rankings and the a priori ordering is a linear function of L.
Furthermore, the L test has parametric analogs in experimental
designs such as the randomized block design (Azzam, Awad, & Sarie,
1987; Hollander, 1967) and the repeated measures or split-plot
design (Siegel & Castellan, 1988). Although the L test is easily computed,
few extensions to more complicated designs have been
developed. Therefore, this procedure has been of questionable
versatility, although it has been shown to be extremely robust
because of its relationship to the Spearman coefficient (Page, 1963).
Hollander (1967) showed that the asymptotic relative efficiency of Page's L test
with respect to the t-test is .714 for k=3 and .955 as k approaches
infinity.

In practice, Page's L test is used to statistically determine
whether one group of n judges agrees with an a priori ordering of
treatments. For example, this test could be used to examine whether
four school board members (n=4) rank the importance of five
educational objectives (k=5) as defined by Bloom's Taxonomy of
Educational Objectives (i.e., Bloom & Madaus, 1981). In Table 1, the
hypothetical data are cast into two-way tables having n rows and k
columns. Separately for each row (school board member), the 5
objectives are ranked. Each column of ranks (T_i) is summed
and weighted by the order of the treatment. These weighted
components are then summed to form L, which can be represented with
the following formula:

$$L = \sum_{i=1}^{k} i \sum_{j=1}^{n} T_{ij} \tag{3}$$

where T_ij is the rank that judge j assigns to treatment i.
Thus, from the data in Table 1 (Panel A), L = 217, which is significant
with p < .001 (see Page, 1963, for tables). Therefore, the null
hypothesis that the mean rankings T_i are equal across
the k objectives was rejected, and the alternative hypothesis that
the school board members' rankings follow the a priori ordering
(with at least one strict inequality) was accepted.
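As a sketch of Eq. 3, the Python below computes L from a list of judges' rankings. It is illustrative only and not the author's program: the name `page_L`, the layout (one list per judge, with position i holding the rank given to treatment i + 1 in the a priori order), and the rankings themselves are hypothetical, not the Table 1 data.

```python
def page_L(rankings):
    """Page's L (Eq. 3) for a list of judges' rankings.

    Each ranking lists, in a priori order, the rank (1..k) that one judge
    gave to treatments 1..k.
    """
    k = len(rankings[0])
    # Sum the ranks in each treatment column (the column totals).
    column_totals = [sum(r[i] for r in rankings) for i in range(k)]
    # Weight the ith column total by its a priori position i (1-based) and add up.
    return sum((i + 1) * total for i, total in enumerate(column_totals))


# Hypothetical rankings of k = 5 objectives.
perfect = [[1, 2, 3, 4, 5]] * 4     # four judges who all match the a priori order
print(page_L(perfect))              # 220, i.e., 4 * (1 + 4 + 9 + 16 + 25)
print(page_L([[5, 4, 3, 2, 1]]))    # 35, one judge in exactly reverse order
```

A judge whose ranks match the a priori order contributes the maximum k(k+1)(2k+1)/6 (55 for k = 5), so L grows with the number of judges as well as with their agreement.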

If a second group, say concerned parents, were considered, two
questions could be asked. First, to what degree does the consensus of
the second group of judges (parents) agree with the a priori ranking
of objectives? Second, do the two groups differ in the degree of their
consensus with the ordered hypotheses? The first question can be
answered by Page's L test applied to the second group of judges.
Table 1 (Panel B) shows another hypothetical example using 4
parents' rankings of the 5 objectives. The result (L = 185) was not
significant at the .05 alpha level. Thus, there is not sufficient
evidence to reject the null hypothesis that the 5 objectives were
ranked equally.

The second question can be answered by an extension of Page's
L, the L_D test (Leitner & Dayton, 1976), which is defined as the
absolute value of the difference between the L's for the two groups, with
the following null hypothesis:

$$H_0\colon L_1 - L_2 = 0 \quad \text{or} \quad H_0\colon L_D = 0 \tag{4}$$
The analysis for these two groups yields L_D = |217 - 185| =
32, significant at the .05 alpha level (see Leitner & Dayton, 1976, for
tables). Thus, the two groups (school board members and parents)
significantly differ in their degree of consensus with the a priori
ordering of objectives as defined by Bloom's taxonomy. Furthermore,
since school board members have a larger L statistic than do parents,
they also have a significantly higher amount of agreement with the
hypothesized ordering than do the parents. In certain situations, the
L_D test of no difference in L statistics can be viewed as analogous to
testing the linear trend interaction in a 2 by k mixed ANOVA. This
test also assumes that L_D becomes large, up to a maximum value,
when the groups disagree and that the difference L_1 - L_2 is symmetric around zero.
Thus, although Leitner and Dayton defined the L_D test as an absolute
value, directional tests are possible. However, the symmetry
conditions for the null hypothesis do not hold for unequal n's, which
limits analyses to experiments with groups of equal sample size. In
the example given, school board members, as opposed to concerned
parents, are relatively rare and would have smaller sample sizes in
most replications of this hypothetical study. Moreover, equal sample
sizes are uncommon in most research studies. In the present
example, if one parent with a ranking of (5, 4, 3, 2, 1), and thus an
individual L equal to 35, were added, the resultant L would equal 220,
and L_D would have an absolute value equal to 3, seemingly not of
sufficient magnitude to be "significant" in comparison with the previous
L_D. Yet there appears to be quite a disparity between the priorities
of the school board members and the parents, although L_D does
not reflect this. To elucidate, if six school board members were to
agree exactly with the theoretical a priori ordering of the 5
objectives, the resultant L would be 330. If eight concerned parents
were also to agree exactly with these ordered hypotheses, the result
would be L = 440. Thus, L_D would equal 110, even though every
judge, regardless of background, agreed with the hypothesized
ordering and no difference should be detected. Thus,
larger groups would have larger L's by virtue of size rather than
linearity of ranks. One way to circumvent this problem is to scale
the two L's to the same metric. In the present paper, it is proposed
to use an "averaged L" to test differences with the following null
hypothesis:

$$H_0\colon L_B = \left|\bar{L}_1 - \bar{L}_2\right| = 0 \tag{5}$$

where $\bar{L}_1 = L_1/n_1$ and $\bar{L}_2 = L_2/n_2$.

Page (1963) showed that L has the following mean:

$$E(L) = \frac{nk(k+1)^2}{4} \tag{6}$$

In scaling this L by dividing by n, the expected value for the
"averaged L" is as follows:

$$E(\bar{L}) = \frac{k(k+1)^2}{4} \tag{7}$$

Thus, the following can be elaborated: 

$$E(L_B) = E(\bar{L}_1 - \bar{L}_2) = \frac{n_1 k(k+1)^2}{4n_1} - \frac{n_2 k(k+1)^2}{4n_2} = 0 \tag{8}$$

From the third example, L_B = 330/6 - 440/8 = 55 - 55 = 0. Thus,
no difference between the groups' rankings is indicated, although their
sample sizes differ. From the second example (Table 1, Panel C), L_B =
217/4 - 220/5 = 10.25, indicating that differences do indeed exist.
More importantly, this formulation holds for both equal and unequal
sample sizes and is symmetric around zero.
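The contrast between raw and averaged differences in these examples can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and exact `Fraction` arithmetic is used only so that a zero difference comes out exactly zero.

```python
from fractions import Fraction


def averaged_L_difference(L1, n1, L2, n2):
    """L_B = |L1/n1 - L2/n2| (Eq. 5), computed exactly with Fractions."""
    return abs(Fraction(L1, n1) - Fraction(L2, n2))


def expected_averaged_L(k):
    """E(L/n) = k(k+1)^2 / 4 (Eq. 7); the number of judges n drops out."""
    return Fraction(k * (k + 1) ** 2, 4)


# Six board members vs. eight parents, all in perfect agreement with the a priori order.
print(abs(330 - 440), float(averaged_L_difference(330, 6, 440, 8)))   # 110 0.0
# Board members (L = 217, n = 4) vs. parents with one added judge (L = 220, n = 5).
print(abs(217 - 220), float(averaged_L_difference(217, 4, 220, 5)))   # 3 10.25
```

The raw difference is large exactly where the averaged one is zero, and small where the averaged one is large, which is the reversal the text describes.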

In addition, both Page (1963) and Leitner and Dayton (1976) 
developed unit normal approximations for larger samples. Page 
(1963) showed that the L test has the following variance: 

$$\sigma^2(L) = \frac{nk^2(k+1)^2(k-1)}{144} \tag{9}$$

Therefore, the unit normal approximation of the L test is the sample
L minus the expected value of L (Eq. 6), divided by the
standard deviation of L (the square root of Eq. 9), which reduces to:

$$z_L = \frac{12L - 3nk(k+1)^2}{k(k+1)\sqrt{n(k-1)}} \tag{10}$$
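Eqs. 6, 9, and 10 translate directly into code. The sketch below (hypothetical function names, standard library only) evaluates z_L for the Panel A value L = 217 with n = 4 and k = 5 and checks the reduced form of Eq. 10 against (L - E(L))/σ(L). The one-tailed p-value via `math.erfc` is added only for illustration; the text itself relies on Page's exact tables for this small sample.

```python
import math


def expected_L(n, k):
    """E(L) = nk(k+1)^2 / 4 (Eq. 6)."""
    return n * k * (k + 1) ** 2 / 4


def variance_L(n, k):
    """Var(L) = n k^2 (k+1)^2 (k-1) / 144 (Eq. 9)."""
    return n * k ** 2 * (k + 1) ** 2 * (k - 1) / 144


def z_L(L, n, k):
    """Unit normal approximation of Page's L in the reduced form of Eq. 10."""
    return (12 * L - 3 * n * k * (k + 1) ** 2) / (k * (k + 1) * math.sqrt(n * (k - 1)))


z = z_L(217, 4, 5)
# The same value from the unreduced form (L - E(L)) / SD(L).
z_check = (217 - expected_L(4, 5)) / math.sqrt(variance_L(4, 5))
p_upper = 0.5 * math.erfc(z / math.sqrt(2))   # one-tailed upper p-value
print(round(z, 2), round(z_check, 2))          # 3.7 3.7
```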
For the Leitner and Dayton L_D test, the two separate L's have the
same expected value under the null hypothesis and therefore cancel out. Thus, the absolute
difference between the L's is divided by a denominator based on the
standard deviation of the differences in L's. Assuming the two
groups to be independent, L_D should have a variance equal to
σ²(L_1) + σ²(L_2), which, with an equal number of judges in each group, becomes
two times the variance defined in Eq. 9. Thus L_D can be
formulated as:

$$z_{L_D} = \frac{12\,|L_1 - L_2|}{k(k+1)\sqrt{2n(k-1)}} \tag{11}$$
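As previously mentioned, to develop an analogous test for
unequal sample sizes, the difference between L's was redefined in
terms of L_1/n_1 and L_2/n_2. Thus, L_B has the following variance:

$$\sigma^2(L_B) = \sigma^2(\bar{L}_1 - \bar{L}_2) = \sigma^2(\bar{L}_1) + \sigma^2(\bar{L}_2) \tag{12}$$

Using Eq. 9 and the property that dividing a variable by a constant c
divides its variance by c² yields:

$$\sigma^2(\bar{L}_1) + \sigma^2(\bar{L}_2) = \frac{n_1 k^2(k+1)^2(k-1)}{144\,n_1^2} + \frac{n_2 k^2(k+1)^2(k-1)}{144\,n_2^2} \tag{13}$$

Thus, for two unequally sized groups, σ²(L_B) reduces to:

$$\sigma^2(L_B) = \sigma^2(\bar{L}_1 - \bar{L}_2) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{k^2(k+1)^2(k-1)}{144} \tag{14}$$

and the unit normal approximation is:

$$z_{L_B} = \frac{12\,\left|\bar{L}_1 - \bar{L}_2\right|}{k(k+1)\sqrt{(k-1)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \tag{15}$$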

which is equivalent to the Leitner and Dayton test (Eq. 11) when
the sample sizes are equal. In the present example (Table 1
including Panel C), the result, z_LB = 3.06, is statistically significant at
the .05 alpha level, assuming a unit normal approximation.
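A sketch of Eq. 15, assuming independent groups and using hypothetical function names, reproduces z_LB = 3.06 for the example and confirms numerically that it reduces to the Leitner and Dayton form (Eq. 11) when n_1 = n_2. The two-tailed p-value is computed as erfc(|z|/√2), which equals 2[1 - Φ(|z|)].

```python
import math


def z_LD(L1, L2, n, k):
    """Leitner-Dayton approximation for equal group sizes (Eq. 11)."""
    return 12 * abs(L1 - L2) / (k * (k + 1) * math.sqrt(2 * n * (k - 1)))


def z_LB(L1, n1, L2, n2, k):
    """Approximation for the averaged-L difference with possibly unequal n's (Eq. 15)."""
    diff = abs(L1 / n1 - L2 / n2)
    return 12 * diff / (k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2)))


def two_tailed_p(z):
    """Two-tailed p-value for |z| under the unit normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Example from the text: L1 = 217 (n1 = 4) vs. L2 = 220 (n2 = 5), k = 5.
z = z_LB(217, 4, 220, 5, 5)
print(round(z, 2))   # 3.06

# With equal n's, Eq. 15 gives the same value as Eq. 11 (Panels A and B: 217 vs. 185).
print(round(z_LB(217, 4, 185, 4, 5), 4), round(z_LD(217, 185, 4, 5), 4))   # 2.2627 2.2627
```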

Tests Related to L 

Since Page's L tests an ordered hypothesis of monotonic trend in
the data, it also tests the linear trend in ranks. Given linear effects, the L
test is also analogous to regression analyses. Since most rank tests
are considered to be alternatives to parametric procedures based on
the General Linear Model, there are several nonparametric tests
related to Page's L test.

Single Group Tests. Similar to the k by n layout analyzed by
Page's L, the Friedman two-way ANOVA by ranks (Friedman, 1937)
tests a null hypothesis that the k matched samples (or repeated
measures) have been drawn from the same population. The
alternative hypothesis, however, is more general than that of the L
test and states that at least one pair among the k repeated measures
has different medians. The Friedman statistic is based on Kendall's
coefficient of concordance (Kendall & Babington Smith, 1937), which
expresses the degree of association among the n judges' rankings of
the k alternatives. The Friedman test is formulated as follows:



$$\chi^2_F = \frac{12}{nk(k+1)} \sum_{i=1}^{k} \left( \sum_{j=1}^{n} T_{ij} \right)^2 - 3n(k+1) \tag{16}$$

where T_ij is the rank (from 1 to k) given to the ith treatment by the jth judge.
This statistic approximates a chi-squared distribution with k-1
degrees of freedom and differs from Page's L in that the column
totals are squared rather than multiplied by an a priori coefficient
ranging from 1 to k (see Eq. 3 for comparison). Thus the Friedman
χ²_F, used for testing the significance of the differences among
treatments, is closely related to an omnibus F test of the treatment
(repeated measures) sum of squares (SS_T) in the parametric split-plot
design. Indeed, it can be shown that χ²_F is equivalent to:

$$\chi^2_F = \frac{12\,SS_T}{k(k+1)} \tag{17}$$
Similarly, the square of the L test unit normal approximation (Eq.
10 squared) can be proved equivalent to:

$$z_L^2 = \chi^2_L = \frac{12}{k(k+1)} \times \left(\text{linear contribution of } SS_T\right) \tag{18}$$
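The identities in Eqs. 17 and 18 can be checked numerically. The sketch below uses a small hypothetical set of untied rankings (n = 3, k = 4). SS_T is computed as n times the sum of squared deviations of the treatment mean ranks from (k + 1)/2, and its linear component uses centred weights i - (k + 1)/2; that contrast coding is an assumed (standard) choice, and all function names are hypothetical.

```python
import math


def column_totals(rankings):
    """Rank totals for each treatment, in a priori order."""
    k = len(rankings[0])
    return [sum(r[i] for r in rankings) for i in range(k)]


def friedman_chi2(rankings):
    """Friedman statistic as in Eq. 16 (no ties assumed)."""
    n, k = len(rankings), len(rankings[0])
    totals = column_totals(rankings)
    return 12 / (n * k * (k + 1)) * sum(t * t for t in totals) - 3 * n * (k + 1)


def ss_treatments(rankings):
    """Treatment sum of squares of the ranks around the grand mean (k + 1) / 2."""
    n, k = len(rankings), len(rankings[0])
    grand_mean = (k + 1) / 2
    return sum(n * (t / n - grand_mean) ** 2 for t in column_totals(rankings))


def ss_linear(rankings):
    """Linear-trend component of SS_T, using centred weights c_i = i - (k + 1) / 2."""
    n, k = len(rankings), len(rankings[0])
    c = [i + 1 - (k + 1) / 2 for i in range(k)]
    contrast = sum(ci * t for ci, t in zip(c, column_totals(rankings)))
    return contrast ** 2 / (n * sum(ci * ci for ci in c))


def z_L_from_rankings(rankings):
    """Eq. 10 evaluated from raw rankings."""
    n, k = len(rankings), len(rankings[0])
    L = sum((i + 1) * t for i, t in enumerate(column_totals(rankings)))
    return (12 * L - 3 * n * k * (k + 1) ** 2) / (k * (k + 1) * math.sqrt(n * (k - 1)))


ranks = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]   # hypothetical judges
k = 4
print(friedman_chi2(ranks), 12 * ss_treatments(ranks) / (k * (k + 1)))       # Eq. 17
print(z_L_from_rankings(ranks) ** 2, 12 * ss_linear(ranks) / (k * (k + 1)))  # Eq. 18
```

Because the linear component is one part of SS_T, z_L² can never exceed the Friedman statistic for the same untied data.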

Given this restriction of linearity across the k alternatives, the L test
is related to correlation and regression (Page, 1963). Lyerly (1952)
proposed a statistic called average rho, r̄, which was equivalent to
the average Spearman rank-order correlation in a set of n judges.
The equivalence of Page's L to the Lyerly and Spearman statistics can
be demonstrated. Page (1963) showed that L is linearly related to r̄:

$$\bar{r} = \frac{12L}{nk(k^2-1)} - \frac{3(k+1)}{k-1} \tag{19}$$
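Eq. 19 can be checked the same way: compute each judge's Spearman rho with the a priori order, average them, and compare with the value recovered from L. The rankings and function names are hypothetical, and the rankings are untied, which the simple d² formula for rho assumes.

```python
def spearman_with_order(ranking):
    """Spearman rho between one judge's ranks and the a priori order 1..k (no ties)."""
    k = len(ranking)
    d2 = sum((r - (i + 1)) ** 2 for i, r in enumerate(ranking))
    return 1 - 6 * d2 / (k * (k * k - 1))


def rbar_from_L(L, n, k):
    """Average rho recovered from Page's L via Eq. 19."""
    return 12 * L / (n * k * (k * k - 1)) - 3 * (k + 1) / (k - 1)


ranks = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]   # hypothetical judges
n, k = len(ranks), len(ranks[0])
L = sum((i + 1) * sum(r[i] for r in ranks) for i in range(k))
rbar = sum(spearman_with_order(r) for r in ranks) / n
print(L, round(rbar, 3), round(rbar_from_L(L, n, k), 3))   # 87 0.8 0.8
```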
One variant of Page's L test has been proposed in order to better
separate the rankings in the middle of the distribution (Azzam et al.,
1987). The B statistic was proposed because certain rankings that
are distinct from each other receive the same individual L. For
example, with k=4, a ranking of [1, 3, 4, 2] and a ranking of [2, 3, 1, 4]
both receive an individual L of 27. Thus, although the rating processes
that underlie these rankings may be quite different, Page's L does
not distinguish between the two. Therefore, squaring the a priori
ordering coefficient was proposed such that:

$$B = \sum_{i=1}^{k} i^2 \sum_{j=1}^{n} T_{ij} \tag{20}$$
This test has been shown to be generally as powerful as Page's L
(Azzam et al., 1987), but it has a rather complicated unit normal
approximation. Furthermore, the B statistic involves squaring the
ordering coefficient; thus, agreement on the later alternatives is
given more weight and is interpreted as more important than
agreement on the first alternatives. Because of the recency of this
test, along with the implicit weightings and differentiations that are
made, its efficacy in practical situations has not been assessed.
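The tie described above, and how squaring the coefficients breaks it, can be reproduced directly. In this sketch the helper `weighted_sum` is hypothetical: power 1 gives Page's L (Eq. 3) and power 2 gives B (Eq. 20).

```python
def weighted_sum(rankings, power):
    """Sum over treatments of i**power times the column rank total (i = 1..k)."""
    k = len(rankings[0])
    return sum((i + 1) ** power * sum(r[i] for r in rankings) for i in range(k))


a, b = [[1, 3, 4, 2]], [[2, 3, 1, 4]]          # the two k = 4 rankings from the text
print(weighted_sum(a, 1), weighted_sum(b, 1))  # 27 27  (Page's L cannot separate them)
print(weighted_sum(a, 2), weighted_sum(b, 2))  # 81 87  (B can)
```

The second ranking gains under B because it places its highest rank on the last alternative, which illustrates the extra weight B gives to the later alternatives.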

Hollander (1967) proposed a rank procedure for testing ordered
alternatives that has since been classified as an A-type test
because the rankings are taken across blocks. This differs from
Page's L, a W-type test, in which the rankings are taken within a
block (Pirie, 1974). This A-type procedure involves ranking the
differences among blocks and then summing the ranks to form the Y
statistic. Hollander's Y is not distribution free, but it is
asymptotically normal with the following expected value and
variance:

$$E(Y) = \frac{k(k-1)n(n+1)}{8} \tag{21}$$

$$\sigma^2(Y) = \frac{k(k-1)n(n+1)(2n+1)\left[3 + 2(k-2)\rho(F)\right]}{144} \tag{22}$$
where ρ(F) is a factor derived from the sampled distribution, F. The
values for this factor are reported in Hollander (1967). The
asymptotic efficiency of Y relative to the t-test is greater than .864
for all distribution functions, F, and all numbers of alternatives (k).
When F is normal, the Asymptotic Relative Efficiency (ARE) with
respect to the t-test ranges from .963 to an upper limit of .989 as k
approaches infinity. Therefore, Y outperforms Page's L under these
conditions (Hollander, 1967). However, when F is a uniform or
exponential distribution function, ARE values are at least as likely to
favor Page's L as Hollander's Y. In fact, which test performs better
apparently depends on the sampled distribution and the values of k
and n. In any case, the differences in power cannot be expected to
be great. Thus, Page's L and W-type tests, in general, are favored
over Hollander's Y (and other A-type tests) specifically because W-type
tests are (a) distribution free, (b) better able to control Type I
error rates, and (c) easily computed (Pirie, 1974).

Multiple Group Tests. Jonckheere (1954) proposed a test of
ordered alternatives for k independent samples. This statistic tests
the null hypothesis that the medians are the same across groups
(samples). The alternative hypothesis is that the medians are
ordered in magnitude with at least one strict inequality. This
procedure uses a Mann-Whitney count method and is based on the
average Kendall rank-order correlation (Kendall's tau) between the
observed ranking of the ith judge and the a priori ordering.
Hollander (1967) showed that the asymptotic relative efficiency of Jonckheere's
test with respect to the t-test is similar to that of Page's L, with an
ARE of .694 for k=3 and an ARE of .955 as k approaches infinity.
Furthermore, since Jonckheere's test is based on Kendall's rank-order
coefficient, Page's L can be shown to be more powerful because of
L's within-subject design. As compared to Kendall's tau, Page's L
has an ARE equal to:
ARE(Llx)=i^^^^-til 
2(k+l)^ 

which reaches its maximum at k=5 and never falls below zero. 

The Schucany and Frawley (1973) model is based on Page's L and
is designed to test differential concordance in terms of the
correlation between ranks of k alternatives assigned by two
independent groups of judges. The statistic multiplies the two groups'
rank totals (summed over judges) for each alternative and then sums
these products over the k alternatives:



$$S = \sum_{j=1}^{k} R_{1j} R_{2j} \tag{23}$$

where R_1j and R_2j are the rank totals for alternative j in Groups 1 and 2.
A unit normal approximation was formulated as follows:

$$z_S = \frac{12S - 3n_1 n_2 k(k+1)^2}{k(k+1)\sqrt{n_1 n_2 (k-1)}} \tag{24}$$
Unlike the test presently proposed, which has a null hypothesis of no
between-group differences in L, the Schucany and Frawley model has
been criticized because it tests the null hypothesis that there is no
concordance (Serlin & Marascuilo, 1983). Thus, the two groups may be
concordant with each other while both are discordant with the a priori ordering.
Serlin and Marascuilo (1983) point out, "It is hard to conceive
of concordance between groups when there is no evidence that there
is concordance within groups" (p. 194).
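As a check on the moments behind Eq. 24, the sketch below computes S from rank totals and, for the smallest case (one judge per group, k = 3), enumerates all 36 equally likely pairs of rankings under independence. The enumerated mean and variance should equal n_1n_2k(k+1)²/4 = 12 and n_1n_2k²(k+1)²(k-1)/144 = 2, the values implied by Eq. 24. The function names are hypothetical.

```python
from itertools import permutations
import math


def schucany_frawley_S(group1, group2):
    """S = sum_j R1j * R2j (Eq. 23), where Rgj is group g's rank total for alternative j."""
    k = len(group1[0])
    R1 = [sum(r[j] for r in group1) for j in range(k)]
    R2 = [sum(r[j] for r in group2) for j in range(k)]
    return sum(a * b for a, b in zip(R1, R2))


def z_S(S, n1, n2, k):
    """Unit normal approximation of Eq. 24."""
    return (12 * S - 3 * n1 * n2 * k * (k + 1) ** 2) / (k * (k + 1) * math.sqrt(n1 * n2 * (k - 1)))


# Exact null moments for one judge per group and k = 3, by full enumeration.
k = 3
perms = list(permutations(range(1, k + 1)))
values = [schucany_frawley_S([p], [q]) for p in perms for q in perms]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, var)   # 12.0 2.0
```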

Hollander and Sethuraman (1978) have also questioned the
Schucany and Frawley model because it tests for positive rank-order
correlation in the mean ranks as an alternative. Instead,
Hollander and Sethuraman provided a two group procedure that
tests the identity of mean ranks across the k alternatives as the null
hypothesis. Serlin and Marascuilo (1983) extended this method to
multi-group situations and also developed planned and post-hoc
comparison procedures. These multiple comparison procedures are
capable of testing group differences at each of the k levels of
treatment alternatives and are thus analogous to simple main effects
in the ANOVA. Although the computation of these comparison
procedures is relatively simple, the omnibus tests for the Hollander
and Sethuraman and the Serlin and Marascuilo formulations
are based on multivariate procedures and are rather complex.

Since Page's L is a test of the linearity of ranks, power analyses
will proceed by generating rankings around different linear and
monotonic effects. A comparison of the L_B test with the mixed
design ANOVA with linear and "double-ends" monotonic (Gaito,
1965) interaction contrasts will then be completed. The relative power of
the L_B test is not expected to exceed that of the parametric tests in many
cases, because within-ranking and within-group variances will
initially be kept homogeneous. However, the well-known effects of
violating the homogeneity of variance assumption in combination
with unequal sample sizes (Glass, Peckham, & Sanders, 1972) could
become an issue. This is possible since nonparametric analyses have
been shown to be more powerful (or robust) under such violations
(Boneau, 1962).

Methods 

The distributions of L were generated for three alternatives
(k=3) from a sample size of 2 (n=2) to as large as a sample size of
n=10. For k=4, the generated distributions of L ranged from n=2 to
n=8, and for k=5, from n=2 to n=6. For the two group situation, all
possible pairwise differences in L's (L_B distributions) were generated
within a given number of alternatives. The critical values at the .10,
.05, and .01 alpha levels were determined by finding the point in the
L_B distribution that did not exceed the 90th, 95th, or 99th percentile,
respectively. These critical values were tabled along with the p-value
derived from z_LB to show the approximation of this
statistic. To demonstrate the fit of the two-tailed unit normal
approximation (z_LB), the ogive of the actual L_B distributions could be
compared with the ogive of the theoretical distribution it is supposed to
approximate (i.e., χ²(1)), and subsequently the Kolmogorov-Smirnov
(K-S) test could be used to test the fit of these statistics. However, it
is important to note that the sample size for a given distribution is
(k!)^N, where N is the total sample size, so that the K-S test will be
highly sensitive to minor deviations at any point in the distribution.
Furthermore, in using the K-S test, fit as a null hypothesis can only
be falsified, and thus retention of such a null hypothesis does not
prove the approximation of the statistic. Moreover, since significance
testing in general uses cumulative proportions of theoretical or actual
distributions to establish critical regions for the rejection of null
hypotheses, the fit at the upper end of the ogives is of more practical
interest. Therefore, to examine the fit of these statistics at the upper
end in the context of the disparity of sample sizes, the difference
between the actual cumulative proportion above the L_B critical value
and the p-value of the approximate statistic (z_LB) at the .10, .05, and
.01 alpha levels is plotted as a function of the ratio of the largest to
total sample size.
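The exact distributions described above can be generated by brute force for small cases. The sketch below is not the author's program. It assumes every ranking is equally likely under the null hypothesis, builds the distribution of L by convolving n single-judge distributions, forms L_B = |L_1/n_1 - L_2/n_2| over all pairs of outcomes, and reads a critical value as the smallest value whose exact upper-tail probability does not exceed alpha. That tail rule, the function names, and the illustrative n_1 = 4, n_2 = 2, k = 3 case are assumptions. It then prints the actual tail proportion minus the z_LB p-value, the kind of difference plotted in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
import math


def single_judge_dist(k):
    """Exact null distribution of one judge's L: all k! rankings equally likely."""
    counts = Counter(
        sum((i + 1) * r for i, r in enumerate(p))
        for p in permutations(range(1, k + 1))
    )
    return {v: Fraction(c, math.factorial(k)) for v, c in counts.items()}


def L_dist(n, k):
    """Exact null distribution of L for n independent judges (n-fold convolution)."""
    base = single_judge_dist(k)
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for a, pa in dist.items():
            for b, pb in base.items():
                new[a + b] = new.get(a + b, 0) + pa * pb
        dist = new
    return dist


def LB_dist(n1, n2, k):
    """Exact null distribution of L_B = |L1/n1 - L2/n2| for two independent groups."""
    d1, d2 = L_dist(n1, k), L_dist(n2, k)
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            key = abs(Fraction(a, n1) - Fraction(b, n2))
            out[key] = out.get(key, 0) + pa * pb
    return out


def critical_value(dist, alpha):
    """Smallest value c with P(L_B >= c) <= alpha, plus that exact tail proportion."""
    crit, tail = None, Fraction(0)
    for v in sorted(dist, reverse=True):
        if tail + dist[v] > alpha:
            break
        tail += dist[v]
        crit = v
    return crit, tail


def z_LB_p_value(diff, n1, n2, k):
    """Two-tailed p-value from z_LB (Eq. 15) at an averaged difference `diff`."""
    z = 12 * float(diff) / (k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2)))
    return math.erfc(z / math.sqrt(2))


n1, n2, k = 4, 2, 3   # hypothetical unequal-n case, small enough to enumerate quickly
dist = LB_dist(n1, n2, k)
for alpha in (Fraction(1, 10), Fraction(1, 20), Fraction(1, 100)):
    crit, tail = critical_value(dist, alpha)
    if crit is None:  # no attainable critical value at this alpha
        continue
    gap = float(tail) - z_LB_p_value(crit, n1, n2, k)   # actual minus approximate
    print(f"alpha={float(alpha):.2f} crit={float(crit):.3f} exact={float(tail):.4f} gap={gap:+.4f}")
```

Exact `Fraction` probabilities keep the tail sums free of rounding error, at the cost of speed; the table sizes in the text (for example k = 4 with n up to 8) remain feasible with this approach, though slower.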

To examine the effects of violating the parametric normality
assumption on the Type I error rate, data for 4 alternatives with no
difference between the L's of two equally sized groups of n=8 (no
trend interaction for the ANOVA) were randomly generated from
normal, uniform, and exponential distributions and replicated 1,000
times. The rejection rates of z_LB and the ANOVA are compared at
the .041 alpha level, since this is the actual proportion above the
critical value for k=4 and equal n's of 8 (see Table 3). The cell
parameters were [2, 3, 4, 1] for both groups, so that there is no
expected interaction and both groups have the same expected value
for L, E(L)=24. In sampling from the normal distribution, data were
generated around these cell parameters with equal cell variances,
σ²(wc)=4. Using the same parameter seeds for generating the
uniformly and exponentially distributed data, the expected values of
the cell parameters change only by a constant (i.e., E(x) + 1/2 for the
uniform distribution and E(x) + σ(wc) for the exponential distribution),
so that the expected ranks, E(L), and the expected differences in
means are not changed. In using the exponential random generator,
the variance is not affected, but the distribution becomes positively
skewed, while the variance for the uniform distribution is changed so
that the within-cell variance is σ²(wc)/12 (i.e., .33 in this case)
(Freund & Walpole, 1987).

To examine the effects of heterogeneity of variance with unequal
sample sizes on the Type I error rate at the three levels of
alternatives, normally distributed data with no differences in the 2
L's (same as the previous parameters) were randomly generated at
differing ratios of variances and sample sizes. With the parameter
average within-cell variance held constant at 9, the within-cell
variance ratios of Group 1 to Group 2 are examined at .2 (3/15), .33
(4.5/13.5), 1 (9/9), 3 (13.5/4.5), and 5 (15/3). With total sample size
held constant at 20, largest (Group 1) to total sample size ratios are
examined at .5 (n_1=10; n_2=10), .7 (n_1=14; n_2=6), .8 (n_1=16; n_2=4), and
.9 (n_1=18; n_2=2). All levels of the sample size and variance ratios
were crossed, and 1,000 replications in each cell were performed for
both z_LB and the ANOVA. For the .05 alpha level, deviations
from 50 (5%) rejections will indicate the "conservative" or "liberal"
nature resulting from the violations of this assumption.
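A Monte Carlo sketch of the normal-data part of these Type I error studies follows. It is an illustration under stated assumptions, not the author's simulation: scores are drawn with `random.Random.gauss` around the cell parameters [2, 3, 4, 1], ranked within each subject, and z_LB (Eq. 15) is tested two-tailed. The uniform and exponential conditions and the ANOVA comparisons are omitted, and the function names and seed are hypothetical.

```python
import math
import random


def within_subject_ranks(scores):
    """Rank one subject's k scores from 1 (smallest) to k (largest).

    Continuous draws make ties occur with probability zero, so no tie handling is done.
    """
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def simulate_group_L(means, sd, n, rng):
    """Draw n subjects with normal scores around the cell means; return the group's Page L."""
    L = 0
    for _ in range(n):
        scores = [rng.gauss(mu, sd) for mu in means]
        L += sum((i + 1) * r for i, r in enumerate(within_subject_ranks(scores)))
    return L


def rejection_rate(means, n1, sd1, n2, sd2, alpha=0.05, reps=1000, seed=1):
    """Proportion of replications in which the two-tailed z_LB test (Eq. 15) rejects.

    Both groups share the same cell means, so there is no mean interaction;
    with sd1 == sd2 the expected averaged L's are also equal.
    """
    rng = random.Random(seed)
    k = len(means)
    denom = k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2))
    rejections = 0
    for _ in range(reps):
        diff = abs(simulate_group_L(means, sd1, n1, rng) / n1
                   - simulate_group_L(means, sd2, n2, rng) / n2)
        z = 12 * diff / denom
        if math.erfc(z / math.sqrt(2)) <= alpha:   # two-tailed unit normal p-value
            rejections += 1
    return rejections / reps


cell_means = [2, 3, 4, 1]   # cell parameters from the Type I error study
print(rejection_rate(cell_means, 8, 2.0, 8, 2.0))                       # n = 8 each, variance 4
print(rejection_rate(cell_means, 18, math.sqrt(3), 2, math.sqrt(15)))   # n ratio .9, variance ratio .2
```

Comparing the printed proportions with the nominal .05 gives the "conservative" or "liberal" reading described in the text; the exact values depend on the seed.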

To analyze power, the two group test for 4 alternatives was
compared to linear polynomial and "double-ends" monotonic (Gaito,
1965) interaction contrasts with a 2 x 4 mixed ANOVA. Two basic
situations are presented. In one case, cell means were randomly
generated around the following equally spaced parameters, which is
analogous to having interval level data:

Equally Spaced Parameters
Group 1   [1, 3, 4, 2]   E(L)=27
Group 2   [3, 2, 4, 1]   E(L)=23

In this case, the rankings of the data and the original data itself
would yield the same results in any parametric procedure, since with
equally spaced parameters rankings are simply a linear
transformation of the data. But in ranking a set of alternatives, the
process underlying this ranking procedure may not be based on
equally spaced parameters, or the data simply may not be measured
at an interval level. Therefore, data with the same expected values
of L were generated around the following unequally spaced
parameters:

Unequally Spaced Parameters
Group 1   [5, 10, 12, 7]   E(L)=27
Group 2   [10, 9, 21, 8]   E(L)=23

Thus, the expected difference in L's is the same in both cases,
E(L_B)=4. In both cases, the distributions are sampled from the
normal, uniform, and exponential distributions, while the respective
cell parameters are held constant. The effects of increased sample
size on power are examined by using n=4, n=8, n=16, and n=32. In the
analysis of these effects when the data are sampled from a normal
distribution, σ²(wc) was held constant at 7.51 for the equally spaced
parameters and at 1.55 for the unequally spaced parameters. These
two values were used because a σ²(wc)=7.55 gives the linear
interaction contrast for the equally spaced parameters with n=4 a
Non-Centrality Parameter (NCP) equal to the test's .05 critical value,
NCP(1,18)=4.41. A σ²(wc)=1.55 does the same for the monotonic
interaction contrast of the unequally spaced parameters. Keeping
the same generation seeds, the uniformly distributed equally spaced
parameters have σ²(wc)=.625, while for the unequally spaced
parameters σ²(wc)=.129.

Results 

Tables 2, 3, and 4 show the exact critical values, the actual
cumulative proportion above the critical value, and the p-values of
the unit normal approximation at nominal alpha levels of .10, .05, and
.01 for 3, 4, and 5 alternatives, respectively. To examine the fit of
the large sample approximation (z_LB), the difference between the
actual cumulative proportion above the critical value and the p-value
of the approximate statistic at the .10, .05, and .01 alpha levels is
plotted as a function of the ratio of the largest to total sample size.
Figure 1 (first panel) shows that for 3 alternatives the values are all
negative, ranging from about -.02 to 0 at the .10 alpha level, which is
reflected by the theoretical distribution having higher ordinates than
the actual distributions. At the .05 level, the fit becomes better, with
both positive and negative values basically centered around zero. At
the .01 level, the values are positive, ranging from 0 to .01, which is
reflected by the actual distribution going above the theoretical χ²(1).
The same basic results can be seen for 4 and 5 alternatives (Fig. 1,
second and third panels), but the differences between the theoretical
and actual cumulative proportions are smaller, demonstrating that
the approximation is closer as the number of alternatives increases.
Another interesting effect is the linear relationship between these
distributional differences and the sample size ratio, most notable at the .05
and .01 alpha levels for k=3. Although the approximation of the
distribution can be used to describe these effects, they can also be
explained in terms of the well-known heterogeneity of variance
effects. That is, in the variance formula for the averaged L's (Eq. 13),
larger samples by definition have smaller variances. Also, since the
distributional differences were calculated as the actual cumulative
proportion minus the theoretical cumulative proportion, positive
values mean that the p-value calculated from z_LB is less than the
actual cumulative proportion above that particular difference in
mean L's, which in practice would lead to more rejections. Therefore,
positive values can be viewed as indicating more "liberal" tests, while
negative values reflect more "conservative" rates of rejection. Thus
the positive relationship between the distributional differences and
sample size ratio at these particular alpha levels indicates that as the
disparity between group sample sizes increases and the larger sample
by definition has less within-cell variance, the test becomes more
"liberal," which is consistent with the effects of heterogeneity of
variance.

For a 2 x 4 design (collapsed to the L_B test for 4 alternatives),
Figure 2 shows the proportions of rejections under null conditions for
L_B and for linear and monotonic interaction contrasts when the data
are sampled from the normal, uniform, and exponential distributions.
Across all conditions, L_B commits fewer Type I errors than either of
the parametric tests, but it also makes fewer than the expected
number of false positives. Thus, the test is somewhat "conservative."
When data are sampled from the normal distribution and especially the
uniform distribution, the results reflect the conservative nature of
L_B. By contrast, the three tests show similar results when
the data are sampled from the exponential distribution.

Figure 3 replicates the effects of heterogeneity of variance in
combination with disparate sample sizes and shows that zLB is
robust to such violations. In Figure 3, the horizontal dashed line is at
the alpha level of .05; major deviations from this line show the
effects of unequal sample sizes and variances. The vertical dashed
line in each panel is at 1.0, where the within-cell variances are equal
for the two groups and unequal cell frequencies should have only minor
effects. In the upper left panel of Figure 3, the largest-to-total
sample size ratio is .5, the situation in which the sample sizes are
equal. At this point all three tests keep the nominal alpha level of
.05, but, as expected, as the largest-to-total sample size ratio
increases, heterogeneity of variance affects the rejection rates of the
parametric tests. For the parametric tests, when the larger sample
has the smaller variance, the tests are more liberal and the Type I
error rate increases to as high as 33% (Fig. 3, lower right panel).
When the larger sample has the larger variance, the rejection rates of
the parametric tests approach zero. For zLB, however, the rejection
rates range from .01 to .075, which, given the conservative nature of
this test, falls within sampling error. Thus, the proposed test is
robust to violations of the homogeneity of variance assumption in
normally distributed data.
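The pattern in Figure 3 can be sketched with a small simulation. The settings are our assumptions, not the paper's: normal data with equal treatment means, k = 4, groups of 32 and 8 judges, and standard deviations of 1 and 3. The parametric comparison is a hypothetical stand-in: a pooled-variance two-sample statistic on each judge's linear contrast score with weights (-3, -1, 1, 3), referred to the unit normal instead of the t distribution, which makes it slightly liberal at small degrees of freedom. Because ranks within a judge do not change when a group's scores are rescaled, zLB should stay near its nominal rate, while the pooled statistic should become liberal when the larger group has the smaller variance and conservative in the reverse case.

```python
import math
import random
from statistics import NormalDist, fmean, variance

WEIGHTS = (-3, -1, 1, 3)  # linear trend weights for k = 4 (assumed)


def ranks(scores):
    """Within-judge ranks, 1 = smallest score."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    r = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        r[j] = rank
    return r


def z_lb(g1, g2):
    """zLB from raw scores: rank within each judge, then compare mean Page's L."""
    k, n1, n2 = len(g1[0]), len(g1), len(g2)

    def mean_l(g):
        return fmean([sum((j + 1) * r for j, r in enumerate(ranks(row))) for row in g])

    lb = mean_l(g1) - mean_l(g2)
    return 12 * lb / (k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2)))


def pooled_contrast_z(g1, g2):
    """Pooled-variance two-sample statistic on per-judge linear contrast scores."""
    s1 = [sum(w * x for w, x in zip(WEIGHTS, row)) for row in g1]
    s2 = [sum(w * x for w, x in zip(WEIGHTS, row)) for row in g2]
    n1, n2 = len(s1), len(s2)
    sp2 = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2)) / (n1 + n2 - 2)
    return (fmean(s1) - fmean(s2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rejection_rates(n1, n2, sd1, sd2, reps=2000, alpha=0.05, seed=7):
    """Two-tailed null rejection rates (zLB, pooled contrast) under unequal variances."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_lb = hits_lin = 0
    for _ in range(reps):
        g1 = [[rng.gauss(0.0, sd1) for _ in WEIGHTS] for _ in range(n1)]
        g2 = [[rng.gauss(0.0, sd2) for _ in WEIGHTS] for _ in range(n2)]
        hits_lb += abs(z_lb(g1, g2)) > crit
        hits_lin += abs(pooled_contrast_z(g1, g2)) > crit
    return hits_lb / reps, hits_lin / reps


SCENARIOS = {  # (n1, n2, sd1, sd2), all hypothetical
    "equal n, equal variance": (20, 20, 1.0, 1.0),
    "larger group, smaller variance": (32, 8, 1.0, 3.0),
    "larger group, larger variance": (32, 8, 3.0, 1.0),
}
results = {label: rejection_rates(*args) for label, args in SCENARIOS.items()}
for label, (rate_lb, rate_lin) in results.items():
    print(f"{label}: zLB {rate_lb:.3f}, pooled contrast {rate_lin:.3f}")
```

The pooled variance is dominated by the larger group, so it underestimates the true standard error when that group is the less variable one; that is the mechanism behind the liberal rates described above.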

Figure 4 shows the power of zLB relative to the linear and monotonic
interaction contrasts under various conditions. As expected, since
equally spaced parameters are analogous to having interval level
data, the parametric tests dominate, especially when the data are
sampled from the normal distribution. However, given a moderate
sample (cell sizes of n = 16) drawn from either the uniform or the
exponential distribution, zLB has comparable power. When
parameters are unequally spaced, linear contrasts cannot compete,
for two reasons. If the data are not interval level, linear tests are not
designed to readily detect their effects; however, if the data are
interval level but the parameters are not equally spaced, the use of
linear tests is a misspecification and therefore inappropriate. Thus,
the monotonic interaction contrasts are clearly more powerful than the
linear contrasts in these situations. Under the conditions that favor
parametric tests (i.e., normally distributed data), the monotonic
contrasts show more power, although zLB becomes comparable at around
a sample size of n = 32. When data are sampled from the exponential
distribution, zLB and the test of monotonic interaction are of
comparable power at n = 16. When data under these simulated
conditions are sampled from the uniform distribution, both tests are
very powerful.
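A rough power sketch in the same spirit follows. Everything here is a hypothetical setup of ours rather than the conditions behind Figure 4: normal errors with unit variance, group 1 given an increasing trend across the four treatments and group 2 none, an "equally spaced" pattern (0, .2, .4, .6) and an "unequally spaced" pattern (0, 0, 0, .6), and the same pooled linear-contrast stand-in as a parametric comparison, referred to the unit normal. The monotonic interaction contrast from the text is not implemented.

```python
import math
import random
from statistics import NormalDist, fmean, variance

WEIGHTS = (-3, -1, 1, 3)  # linear trend weights for k = 4 (assumed)
PATTERNS = {  # hypothetical treatment means for group 1; group 2 is flat
    "equally spaced": (0.0, 0.2, 0.4, 0.6),
    "unequally spaced": (0.0, 0.0, 0.0, 0.6),
}


def ranks(scores):
    """Within-judge ranks, 1 = smallest score."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    r = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        r[j] = rank
    return r


def z_lb(g1, g2):
    """zLB from raw scores: rank within each judge, then compare mean Page's L."""
    k, n1, n2 = len(g1[0]), len(g1), len(g2)

    def mean_l(g):
        return fmean([sum((j + 1) * r for j, r in enumerate(ranks(row))) for row in g])

    lb = mean_l(g1) - mean_l(g2)
    return 12 * lb / (k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2)))


def pooled_contrast_z(g1, g2):
    """Pooled-variance two-sample statistic on per-judge linear contrast scores."""
    s1 = [sum(w * x for w, x in zip(WEIGHTS, row)) for row in g1]
    s2 = [sum(w * x for w, x in zip(WEIGHTS, row)) for row in g2]
    n1, n2 = len(s1), len(s2)
    sp2 = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2)) / (n1 + n2 - 2)
    return (fmean(s1) - fmean(s2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def power(means, n, reps=1000, alpha=0.05, seed=11):
    """Two-tailed rejection rates (zLB, pooled linear contrast), n judges per group."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_lb = hits_lin = 0
    for _ in range(reps):
        g1 = [[m + rng.gauss(0.0, 1.0) for m in means] for _ in range(n)]
        g2 = [[rng.gauss(0.0, 1.0) for _ in means] for _ in range(n)]
        hits_lb += abs(z_lb(g1, g2)) > crit
        hits_lin += abs(pooled_contrast_z(g1, g2)) > crit
    return hits_lb / reps, hits_lin / reps


table = {(name, n): power(means, n)
         for name, means in PATTERNS.items() for n in (8, 16, 32)}
for (name, n), (p_lb, p_lin) in table.items():
    print(f"{name:>16}, n = {n:2d}: zLB {p_lb:.2f}, linear {p_lin:.2f}")
```

Changing the entries of `PATTERNS` or swapping `rng.gauss` for another generator is the natural way to explore other conditions; the numbers it prints are illustrations of this toy setup only.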

Discussion 

These results demonstrate that the proposed procedure for testing
differences in a priori monotonic or linear trends of ranks, by
testing the differences in Page's L, approximates the commonly used
unit normal and chi-square distributions, is robust to the violations
most deleterious to parametric tests (i.e., heterogeneity of
variance), and can be rather powerful under certain distributional
conditions. Furthermore, it makes less restrictive assumptions about
the shape of the sampled distribution and is easy to compute. The
results showed that as the number of treatments increases, the fit of
the unit normal approximation (zLB) improves, but even for three
alternatives the p-value from zLB differs by no more than 2% from the
proportion above the actual critical value. The performance of this
test in the 4-alternative situation was compared with that of
parametric tests in a 2 X 4 mixed design, and the test was shown to
make fewer Type I errors than its parametric competitors when data
were randomly generated from the normal, uniform, and exponential
distributions. This nonparametric test, however, was also shown to be
generally more conservative, especially with uniformly distributed
variables. In contrast to the parametric tests, zLB does not appear
to be affected by unequally sized samples having unequal variances.
Furthermore, zLB shows adequate power in conditions that do not favor
parametric tests (parametric tests are favored by interval level data
and normally distributed variables). Specifically, when data are
sampled from either the uniform or the exponential distribution, zLB
is of comparable power to the parametric tests with as few as 16
judges per group.

This extension should prove beneficial in educational research for
several reasons. First, well-planned educational research often
involves directional and a priori hypotheses about multiple
treatment effects. Second, applied educational research often
involves smaller samples, which are unlikely to be equal in size.
Also, the dependent variables used to capture educational and
psychological phenomena mostly involve ordinal scale measurement,
which with moderate sample sizes is not technically amenable to
most parametric procedures.
Table 1. Hypothetical data for tests of ordered alternatives.

                          Objectives (k = 5)
School Board Member    T1    T2    T3    T4    T5
S11                     1     3     2     4     5
S12                     1     2     3     5     4
S13                     2     1     3     4     5
S14                     1     2     3     4     5

n1 = 4   ΣT1 = 5   ΣT2 = 8   ΣT3 = 11   ΣT4 = 17   ΣT5 = 19

L1 = 217 = (1)(5) + (2)(8) + (3)(11) + (4)(17) + (5)(19)

Panel B. Hypothetical parent data

                          Objectives (k = 5)
Parent                 T1    T2    T3    T4    T5
S21                     2     1     3     4     5
S22                     3     4     5     1     2
S23                     1     5     4     3     2
S24                     3     2     5     1     4

n2 = 4   ΣT1 = 9   ΣT2 = 12   ΣT3 = 17   ΣT4 = 9   ΣT5 = 13

L2 = 185 = (1)(9) + (2)(12) + (3)(17) + (4)(9) + (5)(13)
LD = 217 - 185 = 32

From equation 11,

$$z = \frac{32}{5(5+1)}\sqrt{\frac{72}{4(5-1)}} = 2.26$$

Panel C. Data for fifth parent

Parent                 T1    T2    T3    T4    T5
S25                     5     4     3     2     1

L2 = 220 = (1)(14) + (2)(16) + (3)(20) + (4)(11) + (5)(14)
LB = 217/4 - 220/5 = 10.25

From equation 15,

$$z_{LB} = \frac{12(10.25)}{5(5+1)\sqrt{(5-1)\left(\frac{1}{4} + \frac{1}{5}\right)}} = 3.06$$
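The Table 1 arithmetic can be checked with a short Python sketch. The two z functions are written to reproduce the plugged-in numbers of Equations 11 and 15 in the table, namely z = [LD / k(k+1)] sqrt(72 / [n(k - 1)]) for equal groups and zLB = 12 LB / [k(k+1) sqrt((k - 1)(1/n1 + 1/n2))] for unequal groups; these general forms and the function names are our reconstruction, not quotations of the original equations.

```python
import math

# Rankings from Table 1 (rows = judges, columns = T1..T5).
board = [[1, 3, 2, 4, 5],
         [1, 2, 3, 5, 4],
         [2, 1, 3, 4, 5],
         [1, 2, 3, 4, 5]]
parents = [[2, 1, 3, 4, 5],
           [3, 4, 5, 1, 2],
           [1, 5, 4, 3, 2],
           [3, 2, 5, 1, 4]]
fifth_parent = [5, 4, 3, 2, 1]


def page_l(rankings):
    """Page's L: sum over treatments j = 1..k of j times the column rank total."""
    k = len(rankings[0])
    return sum((j + 1) * sum(row[j] for row in rankings) for j in range(k))


def z_equal_n(l1, l2, n, k):
    """Equal group sizes; matches the Equation 11 arithmetic shown in Table 1."""
    return (l1 - l2) / (k * (k + 1)) * math.sqrt(72 / (n * (k - 1)))


def z_lb(l1, n1, l2, n2, k):
    """Unequal group sizes; matches the Equation 15 arithmetic shown in Table 1."""
    lb = l1 / n1 - l2 / n2
    return 12 * lb / (k * (k + 1) * math.sqrt((k - 1) * (1 / n1 + 1 / n2)))


k = 5
l1 = page_l(board)
l2 = page_l(parents)
print(l1, l2, l1 - l2, round(z_equal_n(l1, l2, 4, k), 2))  # 217 185 32 2.26

l2_five = page_l(parents + [fifth_parent])
print(l2_five, l1 / 4 - l2_five / 5, round(z_lb(l1, 4, l2_five, 5, k), 2))  # 220 10.25 3.06
```

With n1 = n2 = n the two functions agree, because LB = LD / n in that case; for example, `z_lb(217, 4, 185, 4, 5)` gives the same 2.26 as the equal-size formula.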
Table 2. Critical Values, Actual Proportions of the Distribution, and
Alpha from the Unit Normal Approximation for k = 3

Table 3. Critical Values, Actual Proportions of the Distribution, and
Alpha from the Unit Normal Approximation for k = 4

Table 4. Critical Values, Actual Proportions of the Distribution, and
Alpha from the Unit Normal Approximation for k = 5
Figure 1. Null hypothesis rejection rates under normal, uniform, and
exponential sampling conditions for LB, linear, and monotonic
interaction contrasts.
Figure 3. Null hypothesis rejection rates for zLB, linear, and
monotonic interaction contrasts as a function of variance and sample
size ratios.



[Figure 4 panels: Equally Spaced Parameters and Unequally Spaced Parameters under the Normal, Uniform, and Exponential Distributions; vertical axis 0.0 to 1.0; horizontal axis Within Group Sample Size (n), 0 to 40; curves for zLB, Linear, and Monotonic.]

Figure 4. Power as a function of sample size and distribution.



